Overview of the Patent Translation Task at the NTCIR-7 Workshop
Abstract
To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation and performed the Patent Translation Task at the Seventh NTCIR Workshop. To obtain a parallel corpus, we extracted patent documents for the same or related inventions published in Japan and the United States. Our test collection includes approximately 2 000 000 sentence pairs in Japanese and English, which were extracted automatically from our parallel corpus. These sentence pairs can be used to train and evaluate machine translation systems. Our test collection also includes search topics for cross-lingual patent retrieval, which can be used to evaluate the contribution of machine translation to retrieving patent documents across languages. This paper describes our test collection, our methods for evaluating machine translation, and the evaluation results for the research groups that participated in our task. Our research is the first significant exploration into utilizing patent information for the evaluation of machine translation.
Related resources
Overview of the Patent Translation Task at the NTCIR-8 Workshop
Motivation: Systematic evaluation in NLP/IR is crucial, and large test collections are needed. We have produced test collections for patent retrieval at NTCIR since 2001. What is the next task for patent information? History of Patent IR at NTCIR: NTCIR-3 (2001-2002), 2 years of JPO patent applications, Technology survey: Applied conven...
Overview of the Patent Machine Translation Task at the NTCIR-10 Workshop
This paper gives an overview of the Patent Machine Translation Task (PatentMT) at NTCIR-10 by describing its evaluation methods, test collections, and evaluation results. We organized three patent machine translation subtasks: Chinese to English, Japanese to English, and English to Japanese. For these subtasks, we provided large-scale test collections, including training data, development data ...
Overview of the Patent Machine Translation Task at the NTCIR-9 Workshop
This paper gives an overview of the Patent Machine Translation Task (PatentMT) at NTCIR-9 by describing the test collection, evaluation methods, and evaluation results. We organized three patent machine translation subtasks: Chinese to English, Japanese to English, and English to Japanese. For these subtasks, we provided large-scale test collections, including training data, development data an...
Overview of the Patent Mining Task at the NTCIR-7 Workshop
This paper introduces the Patent Mining Task of the Seventh NTCIR Workshop and the test collections produced in this task. The task’s goal was the classification of research papers written in either Japanese or English in terms of the International Patent Classification (IPC) system, which is a global standard. For this task, 12 participant groups submitted 49 runs. In this paper, we also report...
Overview of the Eighth NTCIR Workshop
For the Eighth NTCIR Workshop (NTCIR-8), we selected and organized seven research areas as “tasks” to investigate, test and benchmark newly constructed test collections. These areas are Complex Cross-Lingual Question Answering (CCLQA), Information Retrieval for Question Answering (IR4QA), Geographic and Temporal Search (GeoTime), Multilingual Opinion Analysis (MOAT), Patent Machine Translation ...